Paraphrasing with Search Engine Query Logs
Abstract
This paper proposes a method for extracting paraphrases from search engine query logs. The method first extracts paraphrase query-title pairs, on the assumption that a search query and the titles of its clicked documents may mean the same thing. It then extracts paraphrase query-query and title-title pairs from the query-title paraphrases with a pivot approach. Paraphrases extracted in each step are validated with a binary classifier. We evaluate the method using a query log from Baidu, a Chinese search engine. Experimental results show that the proposed method is effective: it extracts more than 3.5 million pairs of paraphrases with a precision of over 70%. The results also show that the extracted paraphrases can be used to generate high-quality paraphrase patterns.
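The pivot step described in the abstract can be illustrated with a minimal sketch. It assumes the query-title paraphrase pairs have already been extracted and validated, and it treats any two queries that share a title (or any two titles that share a query) as a candidate pair. The function name `pivot_pairs` and the sample data are hypothetical, and the binary classifier that the paper applies to each step is not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def pivot_pairs(query_title_pairs):
    """Derive candidate query-query and title-title pairs by pivoting.

    Assumption: each input item is a (query, title) pair already judged
    to be a paraphrase. Two queries become a candidate pair when they
    share a title; two titles become a candidate pair when they share
    a query. Classifier validation is deliberately left out.
    """
    titles_by_query = defaultdict(set)
    queries_by_title = defaultdict(set)
    for query, title in query_title_pairs:
        titles_by_query[query].add(title)
        queries_by_title[title].add(query)

    # Pivot on titles: queries that clicked through to the same title.
    query_query = set()
    for queries in queries_by_title.values():
        query_query.update(combinations(sorted(queries), 2))

    # Pivot on queries: titles clicked for the same query.
    title_title = set()
    for titles in titles_by_query.values():
        title_title.update(combinations(sorted(titles), 2))

    return query_query, title_title


# Hypothetical sample data, for illustration only.
sample = [
    ("how to lose weight", "Ways to lose weight"),
    ("weight loss methods", "Ways to lose weight"),
    ("how to lose weight", "Weight loss tips"),
]
qq, tt = pivot_pairs(sample)
print(qq)
print(tt)
```

In this sketch every candidate pair would still need to pass a validation step before being accepted, since sharing a pivot does not by itself guarantee that two strings mean the same thing.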
Similar resources
Query Topic Classification and Sociology of Web Query Logs
This paper describes the objects, tasks, and general procedure of the sociological analysis of Web search engine query logs, and illustrates them with a methodologically complete study of cross-nation changes in search image, based on query logs of the national search audience collected two years apart.
Position Paper: Access to Query Logs – An Academic Researcher’s Point of View
Academic researchers have very limited access to query logs of major web search engines. Studying and analyzing large-scale query logs is essential for advancing Web IR. We propose setting up review boards with clear rules for appropriate conduct, and allowing researchers access to logs within this framework.
Quantifying Asymmetric Semantic Relations from Query Logs by Resource Allocation
In this paper we present a bipartite-network-based resource allocation (BNRA) method to extract and quantify semantic relations from large-scale search engine query logs. First, we construct a query-URL bipartite network from the query logs. Using BNRA, we then extract asymmetric semantic relations between queries from the bipartite network. Asymmetric relation indicates that two relate...
Learning to Rank Query Recommendations by Semantic Similarity
The web logs of people's interactions with a search engine show that users often reformulate their queries. Examining these reformulations shows that recommendations that sharpen the focus of a query are helpful, such as those based on expansions of the original query. But it also shows that queries that express some topical shift with respect to the original query can help users access more...
Semantic microaggregation for the anonymization of query logs using the open directory project
Web search engines gather information from the queries performed by the user in the form of query logs. These logs are extremely useful for research, marketing, or profiling, but at the same time they are a great threat to the user’s privacy. We provide a novel approach to anonymize query logs so they ensure user k-anonymity, by extending a common method used in statistical disclosure control: ...